*****************************************
* This file lists all of the relevant code and files for replicating the results from "The Global Distribution of College Graduate Quality"
*
* The raw data are available from Glassdoor. Included in this folder is the code that can be used to extract the data, as well as the Stata and Python files used to create the figures and tables. Files are listed in the order in which they should be run.
*
* To replicate our results, we have streamlined the process so that only two programs need to be run.
*	STEP 1: Run the first program, Master_AB.py, which implements the programs from Sections A and B. It requires Python.
*	STEP 2: Run the second program, Master_CD.do, which implements the programs from Sections C and D. It requires Stata.
*
* In each file, set "gdSeed" and "fileSeed" to the paths where the extracted Glassdoor data and the Replication folder are stored, respectively.
*
* Note: Since the programs in Section A require direct access to where the Glassdoor database is stored, they are commented out in Master_AB.py.
*	We have included these programs for transparency, so that readers can see the code with which we extracted our data from the Glassdoor database.
*	Anyone granted access to the database would first need to save their credentials in a local file, config.yaml, after which they could run the programs in Section A.
*
* Note: Researchers who wish to reproduce our results will need to acquire access to the Glassdoor data. They can request the data from Glassdoor's Economic Research Team (https://www.glassdoor.com/research/team/meet-the-team/).
*	The Glassdoor database is continuously changing as users join, provide new data, and update existing data.
*	Our results use extracts of pay data as of July 13, 2022; educational resume data as of July 16, 2022; and resume work experience as of January 3, 2022.
*	Researchers granted access to the raw data who condition on information available as of these dates using the code in Section A will obtain results that are similar, but not identical, to ours.
*	Slight differences will arise because users can change information such as their resumes, and Glassdoor does not maintain old versions of such information.
*	Researchers who wish to reproduce our results exactly should provide us with a written statement from Glassdoor confirming their access to the data.
*	Upon receipt, we will provide our extracts from Section A; running Master_AB.py and Master_CD.do on these extracts will then replicate our results.
*
* Corresponding Author: Jason Sockin
* Email: Jason.sockin@gmail.com
*****************************************
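For illustration, here is a minimal Python sketch of how the two path variables could be used to locate inputs and outputs. The directory values and the helper functions glassdoor_file and replication_file are hypothetical. The actual programs may build their paths differently.

```python
from pathlib import Path

# Hypothetical locations: replace with the paths on your own machine.
gdSeed = Path("/data/glassdoor_extracts")  # folder holding the extracted Glassdoor data
fileSeed = Path("/projects/Replication")   # the Replication folder


def glassdoor_file(name: str) -> Path:
    """Build the path to an extracted Glassdoor file (hypothetical helper)."""
    return gdSeed / name


def replication_file(name: str) -> Path:
    """Build the path to a file inside the Replication folder (hypothetical helper)."""
    return fileSeed / name


salaries_path = glassdoor_file("International_salaries_07_13_2022.csv")
keywords_path = replication_file("Keywords_MAJOR.txt")
```

Keeping the two roots in one place means only these two assignments change when the package is moved to a new machine.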

*----------------------------------------------------
* Section A: Extracting the Glassdoor data
*----------------------------------------------------

A) R_extract_salaries.py
	
	Description: Extracts the salary reports that users submitted to Glassdoor for all countries.
	Output: International_salaries_07_13_2022.csv

B) R_extract_resumes_educ.py
	
	Description: Extracts workers' educational histories from the resumes they provided to Glassdoor.
	Output: Resumes_educ_07_16_2022.csv

C) R_extract_resumes_workexp.py
	
	Description: Extracts workers' work histories from the resumes they provided to Glassdoor.
	Output: Resumes_workexp_birthYear_STARTYEAR_ENDYEAR_01_03_2022.csv
		Resumes_workexp_noBirthYear_01_03_2022.csv
 
*----------------------------------------------------
* Section B: Compiling the Glassdoor data
*----------------------------------------------------

A) R_create_clean_users_schools_majors.py
	
	Description: Standardizes the colleges, majors, and degrees reported in each worker's resume and assigns them to the worker.
	Output: Cleaned_user_schools_majors.csv
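
	A minimal sketch of keyword-based major assignment, assuming standardization works by matching keywords in the free-text field of study. The keyword lists, the category names, and the functions normalize and assign_major are hypothetical. The lists actually used are the ones written to Keywords_MAJOR.txt.

```python
import re
from typing import Optional

# Hypothetical keyword lists; the lists actually used are written to
# Keywords_MAJOR.txt and are not reproduced here.
MAJOR_KEYWORDS = {
    "Economics": ["economics", "econ"],
    "Computer Science": ["computer science", "software engineering"],
    "Business": ["business administration", "finance", "marketing"],
}


def normalize(text: str) -> str:
    """Lower-case and replace punctuation with spaces so keywords match reliably."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def assign_major(raw_field: str) -> Optional[str]:
    """Return the first major whose keyword appears as a whole phrase in the raw field."""
    padded = f" {normalize(raw_field)} "
    for major, keywords in MAJOR_KEYWORDS.items():
        if any(f" {kw} " in padded for kw in keywords):
            return major
    return None
```

	Padding the normalized text with spaces forces whole-phrase matches, so "econ" matches "Econ/Math" but not an unrelated word that merely contains those letters.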

B) R_create_resumes_work_exp.py
	
	Description: Determines whether the worker ever held a C-suite or founder job based on their resume work experience.
	Output: User_founder_csuite.csv
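
	A minimal sketch of flagging founder and C-suite roles from resume job titles, assuming flags are derived by pattern matching on titles. The regular expressions and the function founder_csuite_flags are hypothetical. R_create_resumes_work_exp.py may use different rules.

```python
import re
from typing import Dict, Iterable

# Hypothetical title patterns; \b word boundaries stop "coo" matching "Coordinator".
FOUNDER_PATTERN = re.compile(r"\b(co-?founder|founder)\b", re.IGNORECASE)
CSUITE_PATTERN = re.compile(r"\b(ceo|cfo|coo|cto|chief \w+ officer)\b", re.IGNORECASE)


def founder_csuite_flags(job_titles: Iterable[str]) -> Dict[str, bool]:
    """Flag whether any job title on a worker's resume is a founder or C-suite role."""
    titles = list(job_titles)
    return {
        "ever_founder": any(FOUNDER_PATTERN.search(t) for t in titles),
        "ever_csuite": any(CSUITE_PATTERN.search(t) for t in titles),
    }
```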

C) R_create_US_scorecard.py
	
	Description: Reads in the U.S. Department of Education's College Scorecard and collapses it to the college-major level, using coarser majors, for merging with Glassdoor.
	Output: US_scorecard_earnings.csv
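
	A minimal sketch of collapsing detailed Scorecard fields of study to coarser majors within each college. Three things here are assumptions: the crosswalk COARSE_MAJOR, the row layout, and weighting earnings by graduate counts. The program may aggregate differently.

```python
from collections import defaultdict

# Hypothetical crosswalk from detailed Scorecard fields of study to coarser majors.
COARSE_MAJOR = {
    "Economics": "Social Sciences",
    "Sociology": "Social Sciences",
    "Computer Science": "Computer Science",
}


def collapse_scorecard(rows):
    """Collapse (college, detailed_field, earnings, n_grads) rows to college x coarse major.

    Earnings are averaged with graduate counts as weights (an assumption).
    Rows whose field is not in the crosswalk are dropped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (college, coarse) -> [sum of earnings*n, sum of n]
    for college, field, earnings, n in rows:
        coarse = COARSE_MAJOR.get(field)
        if coarse is None:
            continue
        acc = totals[(college, coarse)]
        acc[0] += earnings * n
        acc[1] += n
    return {key: s / n for key, (s, n) in totals.items() if n > 0}
```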

D) R_create_salaries_international.py
	
	Description: Produces the main pay dataset by merging Glassdoor data with the cleaned resume data.
	Output: Salaries_international_dataset_main.csv
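
	A minimal sketch of the merge step, assuming salary reports and cleaned resumes share a user identifier and that only matched reports are kept (an inner join). The field name user_id and the function merge_salaries_with_resumes are hypothetical.

```python
def merge_salaries_with_resumes(salaries, resumes):
    """Inner-join salary reports to cleaned resume records on a user identifier.

    salaries: list of dicts, each with a "user_id" key (hypothetical field name).
    resumes: dict mapping user_id -> dict of resume attributes (e.g., college, major).
    Salary reports from users without a cleaned resume are dropped.
    """
    merged = []
    for report in salaries:
        resume = resumes.get(report["user_id"])
        if resume is None:
            continue
        merged.append({**report, **resume})
    return merged
```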

*----------------------------------------------------
* Section C: Intermediate Analysis with the Glassdoor data
*----------------------------------------------------

A) R_compare_US_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for U.S. college graduates employed in the U.S.
	Output:	Figure 1
		Selection_UnitedStates.csv

B) R_compare_UK_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for U.K. college graduates employed in the U.K.
	Output:	Figure C1p
		Selection_UnitedKingdom.csv

C) R_compare_Australian_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Australian college graduates employed in Australia.
	Output:	Figure C1a
		Selection_Australia.csv

D) R_compare_Nigeria_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Nigerian college graduates employed in Nigeria.
	Output:	Figure C1j
		Selection_Nigeria.csv

E) R_compare_Japan_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Japanese college graduates employed in Japan.
	Output:	Figure C1g
		Selection_Japan.csv

F) R_compare_Colombia_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Colombian college graduates employed in Colombia.
	Output:	Figure C1c
		Selection_Colombia.csv

G) R_compare_Poland_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Polish college graduates employed in Poland.
	Output:	Figure C1l
		Selection_Poland.csv

H) R_compare_Ireland_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Irish college graduates employed in Ireland.
	Output:	Figure C1e
		Selection_Ireland.csv

I) R_compare_Singapore_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Singaporean college graduates employed in Singapore.
	Output:	Figure C1m
		Selection_Singapore.csv

J) R_compare_India_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Indian college graduates employed in India.
	Output:	Figure C1d
		Selection_India.csv
	
K) R_compare_Philippines_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Filipino college graduates employed in the Philippines.
	Output:	Figure C1k
		Selection_Philippines.csv

L) R_compare_SouthKorea_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for South Korean college graduates employed in South Korea.
	Output:	Figure C1o
		Selection_SouthKorea.csv

M) R_compare_Italian_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Italian college graduates employed in Italy.
	Output:	Figure C1f
		Selection_Italy.csv

N) R_compare_South_Africa_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for South African college graduates employed in South Africa.
	Output:	Figure C1n
		Selection_SouthAfrica.csv

O) R_compare_China_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Chinese college graduates employed in China.
	Output:	Figure C1b
		Selection_China.csv

P) R_compare_New_Zealand_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for New Zealand college graduates employed in New Zealand.
	Output:	Figure C1i
		Selection_NewZealand.csv

Q) R_compare_Netherlands_graduates.do

	Description: Estimates the degree of selection bias into the Glassdoor dataset for Dutch college graduates employed in the Netherlands.
	Output:	Figure C1h
		Selection_Netherlands.csv

R) R_list_keywords_majors_degrees.py

	Description: Produces the lists of keywords used to assign majors and degrees when standardizing resumes, as reported in the Appendix.
	Output: Keywords_MAJOR.txt
		Keywords_DEGREE.txt

*----------------------------------------------------
* Section D: Main Analysis with the Glassdoor data
*----------------------------------------------------
		
A) R_regressions_school_quality_domestic.do

	Description: Estimates skill prices for each country and college graduate quality for each college; relates college graduate quality to economic development and to notable accomplishments, including entrepreneurship and top management; and estimates the degree to which migrants are selected relative to non-migrants.
	Output:	Table 4 (vertically)
		Table 5
		Table 6
		Table 7
		Table 8
		Table B1
		Table B3
		Table B7
		Figure 2a, 2b
		Figure 3
		Figure 5a, 5b
		Figure 6
		Figure 7a, 7b
		Figure B1
		Figure B2
		Figure B3
		Figure B4
		
B) R_regressions_school_quality_alternative_truncation.do

	Description: Estimates college graduate quality under an alternative truncation rule for limiting the influence of low and high pay outliers.
	Output:	Estimates_z_c_altTrunc.dta
		Estimates_q_j_altTrunc.dta
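
	A minimal sketch of percentile-based truncation of pay outliers. This README does not state the actual truncation rule, so the nearest-rank quantile method, the default cutoffs, and the function trim_pay are all hypothetical.

```python
import math


def trim_pay(pays, lower=0.01, upper=0.99):
    """Drop pay observations below the `lower` quantile or above the `upper` quantile.

    Quantiles use the nearest-rank method; the default cutoffs are hypothetical.
    """
    if not pays:
        return []
    ordered = sorted(pays)
    n = len(ordered)
    lo_val = ordered[max(math.ceil(lower * n) - 1, 0)]
    hi_val = ordered[min(math.ceil(upper * n) - 1, n - 1)]
    return [p for p in pays if lo_val <= p <= hi_val]
```

	Changing `lower` and `upper` is the kind of variation an alternative-truncation robustness check would explore.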

C) R_regressions_school_quality_full_impute.do

	Description: Estimates college graduate quality assuming that workers who list only a single college and no degree received their bachelor's degree from that college.
	Output:	Estimates_z_c_imputeMissing.dta
		Estimates_q_j_imputeMissing.dta
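
	A minimal sketch of the imputation rule described above. The resume representation as (college, degree) pairs, the degree label "Bachelor", and the function bachelors_college are hypothetical. The baseline branch assumes that, without imputation, only an explicitly reported bachelor's degree counts.

```python
def bachelors_college(schools, impute_missing=False):
    """Return the college credited with a worker's bachelor's degree, or None.

    schools: list of (college, degree) pairs from one resume; degree may be None.
    """
    # Baseline (assumed): only an explicitly reported bachelor's degree counts.
    for college, degree in schools:
        if degree == "Bachelor":
            return college
    # Imputation: a single listed college with no degree is assumed to be
    # where the bachelor's degree was earned.
    if impute_missing and len(schools) == 1 and schools[0][1] is None:
        return schools[0][0]
    return None
```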

D) R_regressions_school_quality_sensitivity.do

	Description: Estimates college graduate quality under alternative modeling assumptions.
	Output:	Table 9 (separately for each ALTERNATIVE)
		Table 10 (vertically)
		Table B2
		Figure B6a
		Figure B6b
		Estimates_z_c_ALTERNATIVE.dta
		Estimates_q_j_ALTERNATIVE.dta

E) R_college_graduate_quality_distribution_relative_US.py

	Description: Reads in the estimates of college graduate quality and determines where each college would fall in the US distribution of college graduate quality.
	Output: School_q_j_distribution_relative_US.csv
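
	A minimal sketch of placing a college's estimated quality within the US distribution, assuming placement means the unweighted share of US colleges at or below that value. The function percentile_in_us is hypothetical. The actual program may weight colleges, for example by number of graduates.

```python
from bisect import bisect_right


def percentile_in_us(q_j, us_q):
    """Percent of US colleges whose estimated graduate quality is at or below q_j."""
    if not us_q:
        raise ValueError("US distribution is empty")
    ordered = sorted(us_q)
    return 100.0 * bisect_right(ordered, q_j) / len(ordered)
```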
	
F) R_plot_distributions_relative_to_US_quartiles.do

	Description: Plots the distribution of college graduate quality by country relative to US quartiles.
	Output:	Figure 4a
		Figure 4b
		Figure 4c

G) R_analyze_gpa.do

	Description: Relates undergraduate GPA to earnings.
	Output:	Table B8

H) R_regressions_school_quality_bootstrap.do

	Description: Estimates the elasticity of college graduate quality with respect to GDP per worker, using a bootstrap approach to estimate z_c (and thus q_j).
	Output:	Figure B5a
		Figure B5b
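
	To illustrate the idea only, here is a simplified pairs bootstrap of a log-log slope. It assumes the elasticity is the OLS slope of log quality on log GDP per worker. It does not reproduce the program's procedure, which bootstraps the estimation of z_c itself; here observation pairs are resampled directly. The functions ols_slope and bootstrap_elasticity are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def bootstrap_elasticity(log_gdp, log_q, reps=500, seed=0):
    """Point estimate and pairs-bootstrap standard error of the log-log slope."""
    rng = random.Random(seed)
    n = len(log_gdp)
    draws = []
    while len(draws) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [log_gdp[i] for i in idx]
        if len(set(xs)) < 2:  # skip degenerate resamples with no variation in x
            continue
        draws.append(ols_slope(xs, [log_q[i] for i in idx]))
    return ols_slope(log_gdp, log_q), statistics.stdev(draws)
```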

*----------------------------------------------------
* Section E: Summary Tables of Results
*----------------------------------------------------
		
A) Tables_MSS.xlsx
	
	Description: This Excel file organizes the statistics output by the programs above into tables for the paper. Each tab corresponds to a table in the paper or appendix.
	Output: Table 1
		Table 2
		Table 3
		Table B4
		Table B5
		Table B6
		Table B9
		

	


